AITopics | lookahead optimizer

Collaborating Authors

lookahead optimizer

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Lookahead Optimizer: k steps forward, 1 step back

Michael Zhang, James Lucas, Jimmy Ba, Geoffrey E. Hinton

Neural Information Processing SystemsFeb-12-2026, 22:55:59 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, lookahead, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Lookahead Optimizer: k steps forward, 1 step back

Neural Information Processing SystemsDec-25-2025, 17:28:20 GMT

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of ``fast weights generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.

lookahead optimizer, name change, step forward, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

Add feedback

Generalization and Optimization of SGD with Lookahead

Li, Kangcheng, Lei, Yunwen

arXiv.org Machine LearningSep-22-2025

The Lookahead optimizer enhances deep learning models by employing a dual-weight update mechanism, which has been shown to improve the performance of underlying optimizers such as SGD. However, most theoretical studies focus on its convergence on training data, leaving its generalization capabilities less understood. Existing generalization analyses are often limited by restrictive assumptions, such as requiring the loss function to be globally Lipschitz continuous, and their bounds do not fully capture the relationship between optimization and generalization. In this paper, we address these issues by conducting a rigorous stability and generalization analysis of the Lookahead optimizer with minibatch SGD. We leverage on-average model stability to derive generalization bounds for both convex and strongly convex problems without the restrictive Lipschitzness assumption. Our analysis demonstrates a linear speedup with respect to the batch size in the convex setting.

generalization, lookahead, stability, (15 more...)

arXiv.org Machine Learning

2509.15776

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Reviews: Lookahead Optimizer: k steps forward, 1 step back

Neural Information Processing SystemsJan-25-2025, 16:59:59 GMT

Update: I have read the author's response and have kept my score. Please note that in DeVries and Taylor'17, 'ResNet-18' is not truly the ResNet-18 model (it consists of 4 stages and has more than an order of magnitude more parameters than the original ResNet-18 due to wider channels). This should be made clear in the paper in order not to cause more confusion in the community. Originality: Medium/High The proposed algorithm is considerably different than recently proposed methods for deep learning, which gravitate towards adaptive gradient methods. It has some similarities to variance reduction algorithms with inner and outer loops, however Lookahead has a very simple outer loop structure and and is easy to implement.

lookahead optimizer, step back, step forward, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.41)

Add feedback

Lookahead Optimizer: k steps forward, 1 step back

Neural Information Processing SystemsOct-10-2024, 12:37:20 GMT

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of fast weights" generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost.

algorithm, lookahead optimizer, step forward, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.65)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.45)

Add feedback

Lookahead optimizer improves the performance of Convolutional Autoencoders for reconstruction of natural images

Nag, Sayan

arXiv.org Artificial IntelligenceDec-2-2020

Autoencoders are a class of artificial neural networks which have gained a lot of attention in the recent past. Using the encoder block of an autoencoder the input image can be compressed into a meaningful representation. Then a decoder is employed to reconstruct the compressed representation back to a version which looks like the input image. It has plenty of applications in the field of data compression and denoising. Another version of Autoencoders (AE) exist, called Variational AE (VAE) which acts as a generative model like GAN. Recently, an optimizer was introduced which is known as lookahead optimizer which significantly enhances the performances of Adam as well as SGD. In this paper, we implement Convolutional Autoencoders (CAE) and Convolutional Variational Autoencoders (CVAE) with lookahead optimizer (with Adam) and compare them with the Adam (only) optimizer counterparts. For this purpose, we have used a movie dataset comprising of natural images for the former case and CIFAR100 for the latter case. We show that lookahead optimizer (with Adam) improves the performance of CAEs for reconstruction of natural images.

autoencoder, lookahead optimizer, optimizer, (11 more...)

arXiv.org Artificial Intelligence

2012.05694

Country:

North America > Canada > Ontario > Toronto (0.15)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)

Genre: Research Report (0.65)

Industry:

Media > Film (0.53)
Health & Medicine (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

Add feedback

Don't look backwards, LookAhead!

#artificialintelligenceJun-1-2020, 03:38:40 GMT

The task of an optimizer is to look for such a set of weights for which a NN model yields the lowest possible loss. If you only had one weight and a loss function like the one depicted below you wouldn't have to be a genius to find the solution. Unfortunately you normally have a multitude of weights and a loss landscape that is hardly simple, not to mention no longer suited for a 2D drawing. Finding a minimum of such a function is no longer a trivial task. The most common optimizers like Adam or SGD require very time-consuming hyperparameter tuning and can get caught in the local minima.

artificial intelligence, machine learning, optimizer, (16 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.55)

Add feedback

Lookahead Optimizer: k steps forward, 1 step back

Zhang, Michael, Lucas, James, Ba, Jimmy, Hinton, Geoffrey E.

Neural Information Processing SystemsMar-19-2020, 00:31:18 GMT

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of fast weights" generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost.

algorithm, lookahead optimizer, step forward, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.65)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.45)

Add feedback